About the Provider

Qwen is an AI model family developed by Alibaba Group, a major Chinese technology and cloud-computing company. Through its Qwen initiative, Alibaba builds and open-sources advanced language, vision, and coding models under permissive licenses to support innovation, developer tooling, and scalable AI integration across applications.

Model Quickstart

This section helps you quickly get started with the Qwen/Qwen3-VL-8B-Instruct model on the Qubrid AI inferencing platform. To use this model, you need:
  • A valid Qubrid API key
  • Access to the Qubrid inference API
  • Basic knowledge of making API requests in your preferred language
Once authenticated with your API key, you can send inference requests to the Qwen/Qwen3-VL-8B-Instruct model and receive responses based on your input prompts. Below are examples showing how to access the model from different programming environments; choose the one that best fits your workflow.
from openai import OpenAI

# Initialize the OpenAI client with Qubrid base URL
client = OpenAI(
  base_url="https://platform.qubrid.com/v1",
  api_key="QUBRID_API_KEY",  # replace with your Qubrid API key
)

# Create a streaming chat completion
stream = client.chat.completions.create(
  model="Qwen/Qwen3-VL-8B-Instruct",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Describe the main elements."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ],
  max_tokens=2048,
  temperature=0.7,
  top_p=0.9,
  stream=True,
  presence_penalty=0
)

# Streaming output: print tokens as they arrive (use this when stream=True)
for chunk in stream:
  if chunk.choices and chunk.choices[0].delta.content:
    print(chunk.choices[0].delta.content, end="", flush=True)
print("\n")

# Non-streaming alternative: set stream=False above, then replace the loop
# with a single read of the completed response:
# print(stream.choices[0].message.content)

This will produce a response similar to the one below:
This image captures a classic and iconic view of New York City, featuring several key elements:

- **The Statue of Liberty:** Dominating the left side of the frame, the statue stands tall on Liberty Island, 
  its green patina clearly visible. She is depicted holding her torch aloft and a tablet in her other hand.

- **The New York City Skyline:** In the background, the dense and towering skyline of Manhattan stretches 
  across the horizon. Several famous skyscrapers are identifiable, including the Empire State Building 
  (with its distinctive spire) and the modern glass-and-steel towers of the Financial District.

- **The Water:** A wide expanse of the Hudson River or Upper Bay separates Liberty Island from the city. 
  The water is calm, with gentle ripples, and a few small boats or buoys can be seen.

- **The Setting:** The photograph appears to be taken during the "golden hour" – either sunrise or sunset – 
  as indicated by the warm, soft light bathing the buildings and creating a serene atmosphere.

Overall, the image presents a powerful and recognizable symbol of freedom and welcome set against the 
backdrop of one of the world's most famous and bustling metropolises.
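The example above references the image by public URL. OpenAI-compatible endpoints generally also accept base64-encoded data URLs in the `image_url` field for local images; the helper below is a sketch under the assumption that the Qubrid endpoint follows that convention.

```python
import base64

def image_to_data_url(path: str, mime: str = "image/jpeg") -> str:
    """Encode a local image file as a base64 data URL."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# The resulting string can then replace the public URL in the request:
# {"type": "image_url", "image_url": {"url": image_to_data_url("photo.jpg")}}
```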

Model Overview

Qwen3 VL 8B Instruct is a vision-language instruction-tuned model designed to understand and reason over both text and images. It supports OCR, streaming responses, and rich multimodal conversations, making it suitable for vision-language inference workflows that require text–image understanding rather than content generation. The model focuses on strong visual perception, spatial reasoning, long-context understanding, and multimodal reasoning while remaining accessible for deployment across different environments.

Model at a Glance

Feature          Details
Model ID         Qwen/Qwen3-VL-8B-Instruct
Provider         Alibaba Cloud (QwenLM)
Model Type       Vision-Language Instruction-Tuned Model
Architecture     Transformer decoder-only (Qwen3-VL with ViT visual encoder)
Model Size       9B
Parameters       6
Context Length   32K tokens
Training Data    Multilingual multimodal dataset (text + images)

When to use?

Use Qwen3 VL 8B Instruct if your inference workload requires:
  • Understanding and reasoning over images and text together
  • OCR across multiple languages with structured document understanding
  • Visual question answering and image captioning
  • Multimodal chat with streaming support
  • Spatial reasoning and visual perception without image generation needs
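All of these workloads share the same request shape: a user turn mixing text and image parts. A small helper can build such a turn; this is a sketch following the OpenAI-compatible message format shown in the quickstart, and the function name is illustrative.

```python
def vision_message(prompt: str, image_url: str) -> dict:
    """Build one user message combining a text prompt with an image reference."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

The returned dict can be placed directly in the `messages` list of a chat completion request, e.g. for OCR: `vision_message("Transcribe all text in this document.", url)`.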

Inference Parameters

Parameter Name     Type     Default  Description
Streaming          boolean  true     Enable streaming responses for real-time output.
Temperature        number   0.7      Controls randomness in the output.
Max Tokens         number   2048     Maximum number of tokens to generate.
Top P              number   0.9      Controls nucleus sampling.
Top K              number   50       Limits sampling to the top-k tokens.
Presence Penalty   number   0        Discourages repeated tokens in the output.
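These parameters map onto keyword arguments of `client.chat.completions.create(...)`. The sketch below collects the table's defaults; note that `top_k` is not a standard argument of the OpenAI Python SDK, so the `extra_body` route shown is an assumption about how the endpoint would accept it.

```python
# Defaults from the inference parameter table, as keyword arguments for
# client.chat.completions.create(model=..., messages=..., **DEFAULT_PARAMS).
DEFAULT_PARAMS = {
    "stream": True,         # Streaming: incremental, real-time output
    "temperature": 0.7,     # randomness of sampling
    "max_tokens": 2048,     # generation cap
    "top_p": 0.9,           # nucleus sampling threshold
    "presence_penalty": 0,  # discourage repeated tokens
}

# top_k is not part of the standard OpenAI SDK signature; if the platform
# supports it, it would likely be passed via extra_body (assumption):
EXTRA_BODY = {"top_k": 50}
```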

Key Features

  • Strong Vision-Language Capabilities: Handles text and image understanding in a unified manner
  • Multilingual OCR: Supports OCR in up to 32 languages with improved robustness
  • Long-Context & Video Understanding: Designed for extended context reasoning within the Qwen3-VL family
  • Streaming Support: Enables fast, incremental response generation
  • Advanced Spatial & Visual Reasoning: Understands object positions, layouts, and visual relationships
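With streaming enabled, the reply arrives as a sequence of text deltas rather than one message. A small framework-agnostic helper can accumulate those deltas into the full reply; each chunk is assumed to expose `.choices[0].delta.content`, as in the OpenAI-compatible SDK used in the quickstart.

```python
def accumulate_stream(chunks) -> str:
    """Join the text deltas of a streamed chat completion into one string.

    Chunks with no choices or with an empty/None delta are skipped.
    """
    parts = []
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```

This is useful when you want to both display tokens live and keep the complete text afterwards, e.g. for logging or follow-up turns.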

Summary

Qwen3 VL 8B Instruct is a vision-language inference model focused on understanding, reasoning, and interaction across text and images. It supports OCR, streaming responses, and multimodal conversations with strong visual perception and spatial reasoning. The model is suited for document analysis, visual QA, and multimodal chat scenarios. It does not perform image generation and is optimized for understanding tasks. Its Apache 2.0 license and instruction-tuned design make it suitable for accessible deployment on inference platforms.